Mining time-series data using discriminative subsequences

Author

  • Jonathan F. F. Hills
Abstract

Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a comprehensible method of understanding both data and models.

For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers.

We use two methods to increase interpretability: first, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand.

We supplement the ensemble classifier with partial classification. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model.

Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach.
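As a rough illustration of the shapelet transform and the binary presence/absence representation described in the abstract, the sketch below maps each series to its distances from a set of shapelets and then thresholds those distances. It is a minimal sketch under stated assumptions, not the author's implementation: the function names, the use of plain (unnormalised) Euclidean distance, and the fixed per-shapelet thresholds are all hypothetical choices made for the example. How shapelets are selected, clustered, and thresholded is not specified in the abstract and is not shown here.

```python
import math


def shapelet_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any
    same-length window of the series (assumed distance measure)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, window)))
        best = min(best, d)
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector with one distance per shapelet."""
    return [[shapelet_distance(s, series) for s in shapelets]
            for series in dataset]


def binarise(transformed, thresholds):
    """Mark a shapelet as present (1) when its distance is within the
    threshold, otherwise absent (0). Thresholds are hypothetical inputs."""
    return [[1 if d <= t else 0 for d, t in zip(row, thresholds)]
            for row in transformed]


# Toy data: the first series contains the bump [1, 2, 1], the second does not.
dataset = [[0, 0, 1, 2, 1, 0], [0, 0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1]]

distances = shapelet_transform(dataset, shapelets)
presence = binarise(distances, thresholds=[0.5])
print(distances)
print(presence)
```

The distance vectors could be passed to any standard classifier, while the binary rows are the kind of input on which rule sets over shapelet presence could be generated.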

Similar resources

Preventing Meaningless Stock Time Series Pattern Discovery by Changing Perceptually Important Point Detection

Discovery of interesting or frequently appearing time series patterns is one of the important tasks in various time series data mining applications. However, recent research has argued that discovering subsequence patterns in time series using clustering approaches is meaningless. This is due to the presence of trivially matched subsequences in the formation of the time series subsequences using sl...

Pattern Discovery for Locating Motifs in Multivariate, Real-valued Time-series Data

The problem of locating motifs in multivariate, real-valued time series data concerns the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several nonoverlapping subsequences and constitutes a motif because all of the subsequences are similar. This task is a natural extension of univariate motif discovery in both the symbolic and real-valued domains a...

RPM: Representative Pattern Mining for Efficient Time Series Classification

Time series classification is an important problem that has received a great amount of attention by researchers and practitioners in the past two decades. In this work, we propose a novel algorithm for time series classification based on the discovery of class-specific representative patterns. We define representative patterns of a class as a set of subsequences that has the greatest discrimina...

Clustering Unsynchronized Time Series Subsequences with Phase Shift Weighted Spherical k-means Algorithm

Time series have become an important class of temporal data objects in our daily life, while clustering analysis is an effective tool in the field of data mining. However, the validity of clustering time series subsequences has recently been called into doubt by Keogh et al. In this work, we review this problem and propose the phase shift weighted spherical k-means algorithm (PS-WSKM in abbrev...

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence’s fold and functionality. The biggest class of repetitive subsequences is “Transposable Elements”, which has its own sub-classes depending on the contexts’ structures. Many studies have been performed to critically determine the structure and function of repetitiv...

Publication date: 2014